having persistent storage for data signals,
plural computers, each having an interface,
coupled to said data network, for exchanging
data signals between said plural computers,
and
a shared memory subsystem, coupled to said
data network, for assigning a portion of said ad-
dressable memory space to a portion of said
persistent storage of said hard disk to provide
thereby addressable persistent storage for data
signals.



30. A computer system as claimed in claim 29 further 
comprising 

a volatile memory device providing volatile
storage for storing data signals, and wherein said 
shared memory subsystem includes means for 
mapping a portion of said addressable memory 
space to a portion of said volatile storage. 


31. A computer system as claimed in claim 29 further 
comprising, 

a page generator for generating a directory page 
that carries information representative of a location 
monitor that tracks a data storage location, to pro-
vide a directory structure for tracking homeless da-
ta.

32. A computer system as claimed in claim 31 wherein 
said data storage location stores information repre-
sentative of a directory page, to store said directory
structure as pages of homeless data. 



33. A method for providing a computer system having 
a shared addressable memory space, comprising
the steps of 

providing a network for carrying data signals 
representative of computer readable informa-
tion,
providing a hard-disk, coupled to said network, 
and having persistent storage for data signals, 
providing plural computers, each having an in-
terface, coupled to said data network, for ex-
changing data signals between said plural com-
puters, and

assigning a portion of said addressable mem- 
ory space to a portion of said persistent storage 
of said hard disk to provide addressable per-
sistent storage for data signals.
limited to the illustrated embodiments and is to be un-
derstood by the claims set forth below.



Claims 

1. A computer system having a shared addressable 
memory space, comprising 

a data network for carrying data signals repre- 
sentative of computer readable information, 
a persistent memory device, coupled to said 
data network and having persistent storage for 
data signals; 

plural computers, each having 
an interface, coupled to the data network, for 
accessing said data network to exchange data 
signals therewith, and 

a shared memory subsystem for mapping a 
portion of said addressable memory space to a 
portion of said persistent storage to provide
thereby addressable persistent storage for data 
signals. 

2. A computer system as claimed in claim 1 wherein
said persistent memory device comprises a plurality 
of local persistent memory devices each coupled to 
a respective one of said plural computers. 

3. A computer system as claimed in claim 2 further 
comprising 

a distributor for mapping portions of said ad- 
dressable memory space across said plurality of lo- 
cal persistent memory devices, to provide an ad- 
dressable memory space distributed across said lo- 
cal persistent storage of said computers. 

4. A computer system as claimed in claim 3 further 
comprising 

a disk directory manager for tracking said 
mapped portions of said addressable memory 
space to provide information representative of said 
local persistent memory device having said portion 
of said addressable memory space mapped there- 
on. 

5. A computer system as claimed in claim 2 further 
comprising 

a cache system for operating one of said local 
persistent memory devices as a cache memory for 
cache storing data signals associated with recently 
accessed portions of said addressable memory 
space. 

6. A computer system as claimed in claim 2 further
comprising 

a migration controller for selectively moving 
portions of said addressable memory space be-
tween said local persistent memory devices of said
plural computers.

7. A computer system as claimed in claim 2 further 
comprising

a replication controller for generating a copy 
of a portion of said addressable memory space 
maintained in said local persistent memory device 
of a first computer and for storing said copy in said 
local persistent memory device of a second compu-
ter.

8. A computer system as claimed in claim 1 further 
comprising 


a volatile memory device having volatile stor- 
age for data signals, and 
wherein said shared memory subsystem in-
cludes means for mapping a portion of said ad-
dressable memory space to a portion of said
volatile storage.

9. A computer system as claimed in claim 8 wherein 

said volatile memory device comprises a plu-
rality of local volatile memory devices each cou-
pled to a respective one of said plural comput-
ers, and

said persistent memory device comprises a
plurality of local persistent memory devices
each coupled to a respective one of said plural
computers.

10. A computer system as claimed in claim 9 further 
comprising a distributor, coupled to said data net-
work, for mapping portions of said addressable
memory space across said storage provided by said 
local persistent and volatile memory devices. 

11. A computer system as claimed in claim 10 further
comprising 

a directory manager for tracking said mapped 
portions of said addressable memory space. 

12. A computer system as claimed in claim 10 further
comprising 

a disk directory manager for tracking portions 
of said addressable memory space mapped to said 
local persistent memory devices. 


13. A computer system as claimed in claim 11 wherein 
said directory manager includes 

a RAM directory manager for tracking por-
tions of said addressable memory space mapped
to said local volatile memory devices.

14. A computer system as claimed in claim 9 further
comprising 
a RAM cache system for operating one of said
local volatile memory devices as a cache memory 
for cache storing data signals associated with re- 
cently accessed portions of said addressable mem- 
ory space. 

15. A computer system as claimed in claim 9 further 
comprising 

a paging element for remapping a portion of 
said addressable memory space between one of 
said local volatile memory devices and one of said 
local persistent memory devices. 

16. A computer system as claimed in claim 15 further 
comprising 

a policy controller for determining a resource 
available signal representative of storage available 
on each of said plural computers, and wherein
said paging element remaps said portion of ad- 
dressable memory space from a memory device of 
a first computer to a memory device of a second 
computer, responsive to said resource available 
signal. 

17. A computer system as claimed in claim 9 further 
comprising 

a migration controller for moving portions of 
addressable memory space between said local vol- 
atile memory devices of said plural computers. 

18. A computer system as claimed in claim 9 further 
comprising 

a hierarchy manager for organizing said plural 
computers into a set of hierarchical groups wherein 
each group includes at least one of said plural com- 
puters. 

19. A computer system as claimed in claim 18 wherein 
each said group includes 

a group memory manager for migrating por- 
tions of addressable memory space as a function 
of said hierarchical groups. 

20. A computer system as claimed in claim 9 further 
comprising 

a coherent replication controller for generat- 
ing a coherent copy of a portion of addressable 
memory space. 

21. A computer system as claimed in claim 1 further 
comprising 

an address generator for generating a global 
address signal representative of a portion of ad- 
dressable memory space. 

22. A computer system as claimed in claim 21, wherein
said address generator includes a spanning unit for
generating global address signals as a function of
a storage capacity associated with said persistent
memory devices, to provide global address signals
capable of logically addressing said storage capac-
ity of said persistent memory devices.

23. A computer system as claimed in claim 3 further 
comprising 

a distributed directory manager for storing 
within said distributed memory space, a directory 
signal representative of a storage location of a por-
tion of said addressable memory space.

24. A computer system as claimed in claim 23 wherein 
said distributed directory manager includes

a directory page generator for allocating a por-
tion of said addressable memory space and for
storing therein an entry signal representative of
a portion of said directory signal.

25. A computer system as claimed in claim 24 wherein 
said directory page generator includes 

a range generator for generating a range signal
representative of a portion
of said addressable memory space, and for
generating said entry signal responsive to said
range signal, to provide an entry signal repre-
sentative of a portion of said directory signal
that corresponds to said portion of said ad-
dressable memory space.

26. A computer system as claimed in claim 25 wherein 
said distributed directory manager includes a link-
ing system for linking said directory pages to form
a hierarchical data structure of said linked directory 
pages. 

27. A computer system as claimed in claim 25 wherein
said distributed directory manager includes a range 
linking system for linking said directory pages, as a 
function of said range signal, to form a hierarchical 
data structure of linked directory pages. 


28. A computer system as claimed in claim 24 wherein 
said directory page generator includes a node se-
lector for generating a responsible node signal rep-
resentative of a select one of said plural computers
having location information for a portion of said
shared address space. 

29. A computer system having a shared addressable 
memory space, comprising 


a data network for carrying data signals repre-
sentative of computer readable information,
a hard-disk, coupled to said data network, and 
information including a version tag representative of the
version of the data, a dirty bit representative of whether
the RAM cached data is a copy of the data held on disk, 
or whether the RAM cached data has been modified but 
not yet flushed to disk, a volatile bit to indicate if the page 
is backed by backing store in persistent memory, and 
other such attribute information useful for managing the 
coherency of the stored data. 
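
As a rough illustration of the attribute information listed above, a per-page record might be sketched as follows. The class, field, and method names are hypothetical and do not come from the specification. Because the polarity of the volatile bit is not stated, the sketch simply records whether persistent backing store exists.

```python
from dataclasses import dataclass


@dataclass
class PageAttributes:
    """Hypothetical per-page attribute record (all names are illustrative)."""
    version_tag: int = 0            # version of the data held in the page
    dirty: bool = False             # True: RAM copy modified, not yet flushed to disk
    has_backing_store: bool = True  # assumption: True when persistent memory backs the page

    def record_write(self) -> None:
        # A write produces a new version and leaves the RAM copy ahead of disk.
        self.version_tag += 1
        self.dirty = True

    def record_flush(self) -> None:
        # After a flush the RAM copy again matches the copy held on disk.
        self.dirty = False


attrs = PageAttributes()
attrs.record_write()
```

A coherency mechanism could compare version tags between copies and consult the dirty flag before discarding a RAM copy.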

In the embodiment depicted in FIG. 4, the memory 
subsystem 70 provides the node access to the distrib- 
uted memory space by the coordinated operation of the 
directory manager that includes the global RAM direc- 
tory 80 and the global disk directory 84, the cache con- 
troller that includes the local RAM cache and the local 
disk cache elements 76 and 94, and the copyset ele- 
ments which include the RAM copyset 78 and the disk 
copyset 82. 

The directory manager provides a directory struc- 
ture that indexes the shared address space. Continuing 
with the example of a paged shared address space, the
directory manager of the subsystem 70 allows the host
node to access, by global addresses, pages of the 
shared memory space. 

FIGs. 5 and 6 illustrate one example of a directory 
structure that provides access to the shared memory 
space. FIG. 5 depicts a directory page 120 that includes
a page header 122, directory entries 124 and 126,
wherein each directory entry includes a range field 130,
a responsible node field 132, and an address field 134.
The directory pages can be generated by a directory 
page generator that can be a software module control- 
led by the directory manager. It will be understood that 
the directory manager can generate multiple directories, 
including one for the global disk directory and one for the
global RAM directory. The depicted directory page 120 can
be a page of the global address space, such as a 4K 
byte portion of the shared address space. Therefore, the 
directory page can be stored in the distributed shared 
memory space just as the other pages to which the di- 
rectory pages provide access. 

As further depicted in FIG. 5, each directory page 
120 includes a page header 122 that holds attribute
information for that page, which is typically meta-
data for the directory page, and further includes direc-
tory entries such as the depicted directory entries 124
and 126, which provide an index into a portion of the 
shared address space wherein that portion can be one 
or more pages, including all the pages of the distributed 
shared memory space. The depicted directory page 120
includes directory entries that index a selected range of 
global addresses of the shared memory space. To this 
end, the directory page generator can include a range gener-
ator so that each directory entry can include a range field
130 that describes the start of a range of addresses that
that entry locates.

Accordingly, each directory page 120 can include a
plurality of directory entries, such as entries 124 and
126, that can subdivide the address space into a subset
of address ranges. For example, the depicted directory
page 120 includes two directory entries 124 and 126. 
The directory entries 124 and 126 can, for example, sub- 
divide the address space into two sub-portions. In this 
example, the start address range of the directory entry
124 could be the base address of the address space,
and the start address range of the directory entry 126
could be the address for the upper half of the memory
space. Accordingly, the directory entry 124 provides an
index for pages stored in the address space between
the base address and up to the mid-point of the memory
space and, in complement thereto, the directory entry
126 provides an index to pages stored in the address
space that ranges from the mid-point of the address
space to the highest address.
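
The two-entry example can be sketched in code. This is a minimal illustration only: the names DirectoryEntry and find_entry, the 64 KB address space, and the field values are assumptions made for the example, and the sketch assumes that each range runs up to the start of the next entry.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class DirectoryEntry:
    range_start: int       # range field 130: start of the address range
    responsible_node: int  # responsible node field 132
    address: int           # address field 134


def find_entry(entries: List[DirectoryEntry], global_address: int) -> DirectoryEntry:
    """Return the entry whose range contains global_address.

    Entries are assumed to be sorted by range_start; each range extends to
    the start of the next entry, or to the top of the address space.
    """
    starts = [entry.range_start for entry in entries]
    index = bisect_right(starts, global_address) - 1
    if index < 0:
        raise KeyError("address lies below the first range")
    return entries[index]


BASE, MIDPOINT = 0x0000, 0x8000  # hypothetical 64 KB address space
page_120 = [
    DirectoryEntry(BASE, responsible_node=1, address=0x1000),      # entry 124
    DirectoryEntry(MIDPOINT, responsible_node=2, address=0x2000),  # entry 126
]
```

Here find_entry(page_120, 0x7FFF) selects entry 124, while any address at or above the mid-point selects entry 126.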

FIG. 5 further depicts a directory page 120 that in- 
cludes, in each directory entry, a responsible node field 
132 and the child page global address field 134. These
fields 132, 134 provide further location information for
the data stored in pages within the address range iden-
tified in field 130.

FIG. 6 depicts a directory 140 formed from directory
pages similar to those depicted in FIG. 5. FIG. 6 depicts
that the directory 140 includes directory pages 142,
150-154, and 160-166. FIG. 6 further depicts that the
directory 140 provides location information to the pages
of the distributed shared memory space depicted in FIG.
6 as pages 170-184.

The directory page 142 depicted in FIG. 6 acts like
a root directory page and can be located at a static ad-
dress that is known to each node coupled to the distrib-
uted address space. The root directory page 142 in-
cludes three directory entries 144, 146, and 148. Each
directory page depicted in FIG. 6 has directory entries
similar to those depicted in FIG. 5. For example, direc-
tory entry 144 includes a variable Co which represents
the address range field 130, a variable Nj representative
of the field 132, and a variable Cs representative of the
field 134. The depicted root directory page 142 subdi-
vides the address space into three ranges illustrated as
an address range that extends between the address Co
and Cd, a second address range that extends between
the address Cd and Cg, and a third address range that
extends between Cg and the highest memory location
of the address space.

As further depicted in FIG. 6, each directory entry
144, 146, and 148 points to a subordinate directory
page, depicted as directory pages 150, 152, and 154,
each of which further subdivides the address range in-
dexed by the associated directory entry of the root direc-
tory 142. In FIG. 6, this subdivision process continues
as each of the directory pages 150, 152, and 154
again has directory entries that locate subordinate di-
rectory pages including the depicted examples of direc-
tory pages 160, 162, 164, and 166.

The depicted examples of directory pages 160, 162,
164, and 166 are each leaf entries. The leaf entries con-
tain directory entries such as the directory entries 156
and 158 of the leaf entry 160, that store a range field
130 and the responsible node field 132. These leaf en-
tries identify an address and a responsible node for the
page in the distributed memory space that is being ac-
cessed, such as the depicted pages 170-184. For ex-
ample, as depicted in FIG. 6, the leaf entry 156 points
to the page 170 that corresponds to the range field 130
of the leaf entry 156, which for a leaf entry is the page
being accessed. In this way, the directory structure 140
provides location information for pages stored in the dis-
tributed address space.
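
The walk from a root directory page down to a leaf entry can be sketched as a short tree traversal. The sketch is illustrative only: the names Entry, DirPage, and locate, the two-level tree, and the node numbers are assumptions, not elements of FIG. 6.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Entry:
    range_start: int                   # start of the range this entry indexes
    responsible_node: int              # node that tracks the page location
    child: Optional["DirPage"] = None  # None marks a leaf entry


@dataclass
class DirPage:
    entries: List[Entry] = field(default_factory=list)  # sorted by range_start


def locate(root: DirPage, address: int) -> Tuple[int, int]:
    """Walk from the root to a leaf entry; return (page_start, responsible_node)."""
    page = root
    while True:
        # Pick the last entry whose range starts at or below the address.
        candidates = [e for e in page.entries if e.range_start <= address]
        if not candidates:
            raise KeyError(address)
        entry = candidates[-1]
        if entry.child is None:
            return entry.range_start, entry.responsible_node
        page = entry.child


PAGE = 4096  # 4K byte pages, as in the description
leaf_low = DirPage([Entry(0 * PAGE, 1), Entry(1 * PAGE, 2)])
leaf_high = DirPage([Entry(2 * PAGE, 1), Entry(3 * PAGE, 3)])
root = DirPage([Entry(0, 1, leaf_low), Entry(2 * PAGE, 2, leaf_high)])
```

Each level narrows the range, so a lookup touches one directory page per level of the hierarchy.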

In the depicted embodiment of FIG. 6, a node se- 
lector can select a responsible node for each page, as 
described above, so that the leaf entry 156 provides in-
formation of the address and responsible node of the
page being located. Accordingly, this directory tracks
ownership and responsibility for data, to provide a level
of indirection between the directory and the physical lo-
cation of the data. During a memory access operation,
the memory subsystem 70 passes to the responsible 
node indicated in the leaf entry 156 the address of the 
page being accessed. The shared memory subsystem 
of that node can identify a node that stores a copy of the 
page being accessed, including the owner node. This 
identification of a node having a copy can be performed 
by the RAM copyset or disk copyset of the responsible 
node. The node having a copy stored in its local physical 
memory, such as the owner node, can employ its local 
cache elements, including the local RAM cache and lo-
cal disk cache, to identify from the global address
signal a physical location of the data stored in the page
being accessed. The cache element can employ the op- 
erating system of the owner node to access the memory 
device that maintains that physical location in order that 
the data stored in the page can be accessed. For a read- 
memory operation, or for other similar operations, the 
data read from the physical memory of the owner node 
can be passed via the network to the memory subsys- 
tem of the node requesting the read and subsequently 
stored into the virtual memory space of the requesting 
node for use by that node. 
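
The read path described above can be sketched as follows, assuming in-memory dictionaries stand in for the copysets, the local memories, and the network. The Node class, the read_page function, and the choice to add the requester to the copyset after the read are all assumptions made for illustration.

```python
from typing import Dict, Set


class Node:
    """Hypothetical node with a copyset table and locally held pages."""

    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self.copyset: Dict[int, Set[int]] = {}   # page address -> ids of nodes holding a copy
        self.local_pages: Dict[int, bytes] = {}  # page address -> data held locally


def read_page(nodes: Dict[int, Node], requester: int, responsible: int, address: int) -> bytes:
    """Resolve a read through the responsible node named in a leaf entry."""
    holders = nodes[responsible].copyset[address]  # responsible node consults its copyset
    holder = nodes[min(holders)]                   # pick any holder (lowest id here)
    data = holder.local_pages[address]             # holder reads its local physical memory
    nodes[requester].local_pages[address] = data   # copy is stored on the requesting node
    nodes[responsible].copyset[address].add(requester)  # assumption: new copy is tracked
    return data


nodes = {i: Node(i) for i in (1, 2, 3)}
nodes[3].local_pages[0x4000] = b"payload"  # node 3 owns the page
nodes[2].copyset[0x4000] = {3}             # node 2 is responsible for tracking it
result = read_page(nodes, requester=1, responsible=2, address=0x4000)
```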

With reference again to FIG. 6, it can be seen that 
the depicted directory structure 140 comprises a hierar- 
chical structure. To this end, the directory structure 140 
provides a structure that continually subdivides the 
memory space into smaller and smaller sections. Fur-
ther, each section is represented by directory pages of
the same structure, but indexes address spaces of dif-
ferent sizes. As pages are created or deleted, a linker
inserts or deletes the pages from the directory. In one
embodiment, the linker is a software module for linking
data structures. The linker can operate responsive to
the address ranges to provide the depicted hierarchical
structure. Accordingly, the depicted directory 140 pro-
vides a scalable directory for the shared address
space. Moreover, the directory pages are stored in the
distributed address space and maintained by the distrib-
uted shared memory system. A root for the directory can
be stored in known locations to allow for bootstrap of the
system. Consequently, commonly used pages are cop-
ied and distributed, and rarely used pages are shuffled
off to disk. Similarly, directory pages will migrate to those
nodes that access them most, providing a degree of self-
organization that reduces network traffic.

FIG. 7 depicts the directory of FIG. 6 being em- 
ployed by a system according to the invention. In par- 
ticular FIG. 7 depicts a system 200 that includes two 
nodes, 206a and 206b, a directory structure 140, and a 
pair of local memories having volatile memory devices 
64a and 64b, and persistent memory devices 62a and 
62b. Depicted node 206a includes an address consum-
er 208a, a global address 210a, an interface 42a, a
directory manager 44a and a memory controller 46a.
Node 206b has corresponding elements. The nodes are
connected by the network 54. A directory 140 having a
root page, directory pages A-F and pages 1-5 is further
depicted.

Each node 206a and 206b operates as discussed 
above. The depicted address consumers 208a and 
208b can be an application program, file system, hard- 
ware device or any other such element that requests ac- 
cess to the virtual memory. In operation, the address 
consumers 208a and 208b request an address, or range 
of addresses, and the directory manager can include a 
global address generator that provides the consumer 
with the requested address, or a pointer to the requested 
address. As addresses get generated, the respective di-
rectory managers 44a and 44b generate directory pag-
es and store the pages in the directory structure 140. As
depicted, the directory structure 140 tracks the portions 
of the address space being employed by the system 
200, and physical storage for each page is provided 
within the local memories. 

As shown in FIG. 7, the data associated with the 
directory pages are distributively stored across the two
local memories and duplicate copies can exist. As de- 
scribed above and now illustrated in FIG. 7, the data can 
move between different local memories and also move, 
or page, between volatile and persistent storage. The 
data movement can be responsive to data requests 
made by memory users like application programs, or by 
operation of the migration controller described above. 
As also described above, the movement of data be- 
tween different memory locations can occur without re- 
quiring changes to the directory 140. This is achieved 
by providing a directory 140 that is decoupled from the 
physical location of the data by employing a pointer to 
a responsible node that tracks the data storage location. 
Accordingly, although the data storage location can 
change, the responsible node can remain constant, 
thereby avoiding any need to change the directory 140. 
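
The decoupling can be illustrated with a small sketch in which the directory maps a page only to its responsible node, and the responsible node keeps the current storage location. The table layout, the location strings, and the function names are hypothetical.

```python
from typing import Dict

# Directory: page address -> responsible node (never touched by a migration).
directory: Dict[int, int] = {0x4000: 2}

# Per responsible node: page address -> current storage location.
location_table: Dict[int, Dict[int, str]] = {
    2: {0x4000: "node3:ram"},
}


def migrate(address: int, new_location: str) -> None:
    """Move a page; only the responsible node's table is updated."""
    responsible = directory[address]
    location_table[responsible][address] = new_location


def resolve(address: int) -> str:
    """Two-step lookup: directory, then the responsible node's table."""
    return location_table[directory[address]][address]


snapshot = dict(directory)
migrate(0x4000, "node1:disk")
```

After the migration the directory is unchanged, yet resolve(0x4000) reports the new location.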

It will be understood by those of ordinary skill in the
art that certain modifications, additions, and subtractions
can be made to the embodiments described above with-
out departing from the spirit and scope of the invention.
Accordingly, the invention described above is not to be 
controller 46 of that node and pass to that node's mem-
ory controller the memory request provided by the mem-
ory interface 42. Accordingly, the depicted directory
manager 44 is responsible for managing a directory
structure that identifies for each page of the shared
memory space a responsible node that tracks the phys-
ical location of the data stored in the respective page.
Thus, the directory, rather than directly providing the lo-
cation of the page, can optionally identify a responsible
node, or other device, that tracks the location of the
page. This indirection facilitates maintenance of the di-
rectory as pages migrate between nodes.

The memory controller 46 performs the low level 
memory access functions that physically store data 
within the memory elements connected to the network.
In the depicted embodiment, the directory manager 44
of a first node can pass a memory access request
through the interface 42, to the network module of the
OS 16, and across the network 54 to a second node that
the directory manager 44 identifies as the responsible 
node for the given address. The directory manager 44 
can then query the responsible node to determine the 
attributes and the current owner node of the memory 
page that is associated with the respective global ad- 
dress. The owner of the respective page is the network 
node that has control over the memory storage element 
on which the data of the associated page is stored. The 
memory controller 46 of the owner can access, through 
the OS 16 of that node or through any interface, the 
memory of the owner node to access the data of the 
page that is physically stored on that owner node. 

In particular, as depicted in FIG. 3, the directory 
manager 44 couples to the network module 52 which 
couples to the network 54. The directory manager can 
transmit to the network module 52 a command and as-
sociated data that directs the network module 52 to
pass a data signal to the owner node. The owner node 
receives the memory request across network 54 and 
through network module 52 that passes the memory re- 
quest to the interface 42 of that owner node. The inter- 
face 42 couples to the memory controller 46 and can 
pass the memory request to the local memory controller 
of that owner node for operating the local storage ele- 
ments, such as the disk or RAM elements, to perform 
the requested memory operation. 

Once the owner node has performed the requested 
memory operation, such as reading a page of data, the 
memory subsystem 40 of the owner node can then 
transfer the page of data, or a copy of the page of data, 
via the network 54 to the node that originally requested 
access to that portion of the shared memory. The page 
of data is transferred via the network 54 to the network 
module 52 of the requesting node and the shared mem-
ory subsystem 40 operates the memory controller 46 to
store in the local memory of the requesting node a copy
of the accessed data.

Accordingly, in one embodiment of the invention,
when a first node accesses a page of the shared mem-
ory space which is not stored locally on that node, the
directory manager 44 identifies a node that has a copy
of the data stored in that page and moves a copy of that
data into the local memory of the requesting node. The
local memory storage, both volatile and persistent, of
the requesting node therefore becomes a cache for pag-
es that have been requested by that local node. This
embodiment is depicted in FIG. 3, which depicts a memory
controller that has a local disk cache controller 48 and
a local RAM cache controller 50. Both of these local
cache controllers can provide to the operating system
16, or other consumer, pages of the shared memory
space that are cache stored in the local memory of the
node, including local persistent memory and local vola-
tile memory.
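
A minimal sketch of this caching behavior follows, assuming a callable stands in for the fetch from a remote node. The LocalCache class, its RAM-then-disk lookup order, and the fetch counter are illustrative assumptions rather than details of the depicted controllers 48 and 50.

```python
from typing import Callable, Dict


class LocalCache:
    """Hypothetical local cache: RAM first, then disk, then a remote fetch."""

    def __init__(self, fetch_remote: Callable[[int], bytes]) -> None:
        self.ram: Dict[int, bytes] = {}
        self.disk: Dict[int, bytes] = {}
        self.fetch_remote = fetch_remote
        self.remote_fetches = 0

    def read(self, address: int) -> bytes:
        if address in self.ram:
            return self.ram[address]
        if address in self.disk:
            data = self.disk[address]
        else:
            data = self.fetch_remote(address)  # copy pulled from a node that holds the page
            self.remote_fetches += 1
        self.ram[address] = data               # keep a local copy for later requests
        return data


remote_store = {0x2000: b"remote page"}
cache = LocalCache(remote_store.__getitem__)
first = cache.read(0x2000)
second = cache.read(0x2000)
```

Only the first read crosses the network; the second is served from the local copy.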

The shared memory subsystem can include a co- 
herent replication controller that maintains coherency 
between cached pages by employing a coherence 
through invalidation process, a coherence through mi- 
gration process or other coherence process suitable for 
practice with the present invention. The coherent repli- 
cation controller can automatically generate a copy of 
the data stored in each page and can store the copy in 
a memory device that is separate from the memory de- 
vice of the original copy. This provides for fault tolerant 
operation, as the failure of any one memory device will 
not result in the loss of data. The coherent replication 
controller can be a software module that monitors all cop-
ies of pages kept in volatile memory and made available
for writing. The controller can employ any of the coher-
ency techniques named above, and can store tables of
location information that identify the location
of every generated copy.
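
Coherence through invalidation can be sketched as simple bookkeeping over the set of nodes that hold a copy of each page. The class and method names are hypothetical, and the sketch only tracks which copies become stale; it does not model the network messages that would carry the invalidations.

```python
from typing import Dict, Set


class CoherenceController:
    """Hypothetical write-invalidate bookkeeping for tracked page copies."""

    def __init__(self) -> None:
        self.copies: Dict[int, Set[int]] = {}  # page address -> nodes holding a valid copy

    def record_read(self, address: int, node: int) -> None:
        self.copies.setdefault(address, set()).add(node)

    def write(self, address: int, writer: int) -> Set[int]:
        """Invalidate every other copy; return the nodes that were invalidated."""
        holders = self.copies.setdefault(address, set())
        invalidated = holders - {writer}
        self.copies[address] = {writer}
        return invalidated


ctrl = CoherenceController()
for node in (1, 2, 3):
    ctrl.record_read(0x1000, node)
stale = ctrl.write(0x1000, writer=2)
```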

FIG. 4 illustrates in greater detail one embodiment 
of a shared memory subsystem according to the inven- 
tion. The shared memory subsystem 70 depicted in FIG. 
4 includes a remote operations element 74, a local RAM 
cache 76, a RAM copyset 78, a global RAM directory 
80, a disk copyset 82, a global disk directory 84, a con- 
figuration manager 88, a policy element 90, and a local 
disk cache 94. FIG. 4 further depicts a network element 
104, a physical memory 100, shared data element 102, 
a physical file system 98, which is part of the operating 
system 16, a configuration service 108, a diagnostic 
service 110, and a memory access request 112. The de-
picted subsystem 70 can be a computer program that
couples to the physical memory, file system, and net- 
work system of the host node, or can be electrical circuit 
card assemblies that interface to the host node, or can 
be a combination of programs and circuit card assem- 
blies. 

The flow scheduler 72 depicted in FIG. 4 can or-
chestrate the controls provided by an API of the subsys-
tem 70. In one embodiment, the flow scheduler 72 can
be a state machine that monitors and responds to the
requests 112 and remote requests through network 104
which can be instructions for memory operations and
which can include signals representative of the global
addresses being operated on. These memory operation
requests 112 can act as op-codes for primitive opera-
tions on one or more global addresses. They can be
read and write requests, or other memory operations. 
Alternatively, the flow scheduler 72 can be a program, 
such as an interpreter, that provides an execution envi- 
ronment and can map these op-codes into control flow 
programs called applets. The applets can be independ- 
ent executable programs that employ both environment 
services, such as threading, synchronization, and buffer 
management and the elements depicted in FIG. 4. The 
API is capable of being called from both external clients, 
like a distributed shared memory file system, as well as 
recursively by the applets and the other elements 74-94 
of the subsystem 70. Each element can provide a level 
of encapsulation to the management of a particular re- 
source or aspect of the system. To this end, each ele- 
ment can export an API consisting of functions to be em- 
ployed by the applets. This structure is illustrated in FIG. 
4. Accordingly, the flow scheduler 72 can provide an en- 
vironment to load and execute applets. The applets are 
dispatched by the flow scheduler 72 on a per op-code 
basis and can perform the control flow for sequential or 
parallel execution of an element to implement the op- 
code on the specified global address, such as a read or 
write operation. Optionally, the flow scheduler 72 can 
include an element to change dynamically the applet at 
run time as well as execute applets in parallel and in 
interpreted mode. 
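
The per op-code dispatch can be sketched as a table that maps op-codes to applets. This is an illustration under stated assumptions: the op-code strings, the applet signature, and the dictionary that stands in for the shared memory are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical applet signature: (memory, global address, payload) -> result.
Applet = Callable[[Dict[int, bytes], int, bytes], bytes]


def read_applet(memory: Dict[int, bytes], address: int, payload: bytes) -> bytes:
    return memory.get(address, b"")


def write_applet(memory: Dict[int, bytes], address: int, payload: bytes) -> bytes:
    memory[address] = payload
    return payload


class FlowScheduler:
    """Hypothetical dispatcher that maps op-codes to applets."""

    def __init__(self) -> None:
        self.applets: Dict[str, Applet] = {}
        self.memory: Dict[int, bytes] = {}

    def register(self, op_code: str, applet: Applet) -> None:
        # Registering an op-code again swaps its applet at run time.
        self.applets[op_code] = applet

    def dispatch(self, op_code: str, address: int, payload: bytes = b"") -> bytes:
        return self.applets[op_code](self.memory, address, payload)


scheduler = FlowScheduler()
scheduler.register("READ", read_applet)
scheduler.register("WRITE", write_applet)
scheduler.dispatch("WRITE", 0x3000, b"abc")
```

Because applets are looked up at dispatch time, replacing an entry in the table changes the behavior of an op-code without restarting the scheduler.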

The depicted shared memory subsystem 70 in- 
cludes a bifurcated directory manager that includes the 
global RAM directory 80 and the global disk directory 
84. The global RAM directory 80 is a directory manager 
that tracks information that can provide the location of 
pages that are stored in the volatile memory, typically 
RAM, of the network nodes. The global disk directory 
84 is a global disk directory manager that manages a 
directory structure that tracks information that can pro- 
vide the location of pages that are stored on persistent 
memory devices. Together, the global RAM directory 80 
and the global disk directory 84 provide the shared 
memory subsystem 70 with integrated directory man- 
agement for pages that are stored in persistent storage 
and volatile memory. 

In one embodiment a paging element can operate 
the RAM and disk directory managers to remap portions 
of the addressable memory space between one of the 
volatile memories and one of the persistent memories. 
In the shared memory system, this allows the paging
element to remap pages from the volatile memory of one
node to a disk memory of another node. Accordingly,
the RAM directory manager passes control of that page
to the disk directory manager which can then treat the
page as any other page of data. This allows for improved
load balancing, by removing data from RAM memory,
and storing it in the disk devices, under the control of
the disk directory manager.
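
The hand-off between the two directory managers can be sketched as below. The sketch assumes each directory is a simple table from page address to location; the names `Directory`, `page_out`, and the node labels are hypothetical and are not taken from the specification.

```python
from typing import Dict, Tuple

Location = Tuple[str, str]  # (node name, device kind); both are illustrative labels


class Directory:
    """Minimal directory: page address -> location of the page."""

    def __init__(self) -> None:
        self.entries: Dict[int, Location] = {}


def page_out(ram_dir: Directory, disk_dir: Directory,
             page: int, target_node: str) -> None:
    """Remap a page from some node's RAM to the disk of target_node."""
    ram_dir.entries.pop(page)                       # RAM directory gives up the page
    disk_dir.entries[page] = (target_node, "disk")  # disk directory takes control


ram_directory, disk_directory = Directory(), Directory()
ram_directory.entries[0x2000] = ("node_a", "ram")
page_out(ram_directory, disk_directory, 0x2000, "node_b")
```

After the call the page is tracked only by the disk directory, which can then treat it like any other page it manages.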

The local memory controller of the subsystem 70 is 
provided by the local RAM cache 76 and the local disk
cache 94. The local RAM cache 76, which couples to the
physical memory 100 of the local node, can access, as
described above, the virtual memory space of the local
node to access data that is physically stored within the
RAM memory 100. Similarly, the local disk cache 94
couples to the persistent storage device 98 and can
access a physical location that maintains, in the local
persistent storage, data of the distributed shared memory.

FIG. 4 also depicts a remote operations element 74
that couples between the network 104 and the flow
scheduler 72. The remote operations element 74 negotiates
the transfer of data across the network 104 for
moving portions of the data stored in the shared memory
space between the nodes of the network. The remote
operations element 74 can also request services from
remote peers, such as an invalidate, to help maintain
coherency or for other reasons.

FIG. 4 also depicts a policy element 90 that can be
a software module that acts as a controller to determine
the availability of resources, such as printer capabilities,
hard-disk space, available RAM and other such resources.
The policy controller can employ any suitable
heuristics to direct the elements, such as the paging
controller, disk directory manager, and other elements,
to dynamically distribute the available resources.

FIG. 4 further depicts a memory subsystem 70 that
includes a RAM copyset 78 and a disk copyset 82.
These copysets can manage copies of pages that are
cached at a single node. The disk copyset 82 can maintain
information on copies of pages that are stored in the
local disk cache, which can be the local persistent memory.
Similarly, the RAM copyset 78 can maintain information
on copies of pages that are stored in the local
RAM cache, which can be the local RAM. These copysets
encapsulate indexing and storage of copyset data
that can be employed by applets or other executing code
for purposes of maintaining the coherency of data stored
in the shared memory space. The copyset elements can
maintain copyset data that identifies the pages cached
by the host node. Further, the copyset can identify the
other nodes on the network that maintain a copy of that
page, and can further identify for each page which of
these nodes is the owner node, wherein the owner node
can be a node which has write privileges to the page
being accessed. The copysets themselves can be
stored in pages of the distributed shared memory space.
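
One possible shape for the copyset data is sketched below. It records, per page, the owner node and the nodes holding a copy, as described above. The class names and the `nodes_to_invalidate` helper are assumptions made for illustration; in particular, using the copyset to pick invalidation targets is one plausible coherency use, not a procedure stated in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class CopysetEntry:
    owner: str                                       # node with write privileges
    holders: Set[str] = field(default_factory=set)   # nodes that hold a copy


class Copyset:
    """Per-node record of cached pages and where their copies live."""

    def __init__(self) -> None:
        self.pages: Dict[int, CopysetEntry] = {}

    def add_copy(self, page: int, node: str, owner: str) -> None:
        entry = self.pages.setdefault(page, CopysetEntry(owner=owner))
        entry.holders.add(node)

    def nodes_to_invalidate(self, page: int, writer: str) -> Set[str]:
        # Assumed policy: every holder other than the writer drops its copy.
        return self.pages[page].holders - {writer}


ram_copyset = Copyset()
for node in ("node_a", "node_b", "node_c"):
    ram_copyset.add_copy(0x3000, node, owner="node_a")
stale = ram_copyset.nodes_to_invalidate(0x3000, writer="node_a")
```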

The local RAM cache 76 provides storage for memory
pages and their attributes. In one embodiment, the
local RAM cache 76 provides a global address index for
accessing the cached pages of the distributed memory
and the attributes based on that page. In this embodiment,
the local RAM cache 76 provides the index by storing
in memory a list of each global address cached in
the local RAM. With each listed global address, the index
provides a pointer into a buffer memory and to the
location of the page data. Optionally, with each listed
global address, the index can further provide attribute

employed by applications when communicating between
applications running on the same machine.
These techniques can employ object linking and em- 
bedding, dynamic link libraries, class registering, and 
other such techniques. Accordingly, the nodes 12 can 
employ the virtual shared memory 22 to exchange data 
and objects between application programs running on 
the different nodes 12 of the network 10. 

In the embodiment depicted in FIG. 1, each node
12 can be a conventional computer system such as a
commercially available IBM PC compatible computer
system. The processor 14 can be any processor unit
suitable for performing the data processing for that computer
system. The operating system 16 can be any commercially
available or proprietary operating system that
includes, or can access, functions for accessing the local
memory of the computer system and networking.

The private memory device 18 can be any computer 
memory device suitable for storing data signals repre- 
sentative of computer readable information. The private 
memory provides the node with local storage that can 
be kept inaccessible to the other nodes on the network. 
Typically the private memory device 18 includes a RAM, 
or a portion of a RAM memory, for temporarily storing 
data and application programs and for providing the 
processor 14 with memory storage for executing pro- 
grams. The private memory device 18 can also include 
persistent memory storage, typically a hard disk unit or 
a portion of a hard disk unit, for the persistent storage 
of data. 

The shared memory subsystem 20 depicted in FIG. 
1 is an embodiment of the invention that couples be- 
tween the operating system 16 and the virtual shared 
memory 22 and forms an interface between the operat- 
ing system 16 and the virtual shared memory to allow 
the operating system 16 to access the virtual shared 
memory 22. The depicted shared memory subsystem 
20 is a software module that operates as a stand-alone 
distributed shared memory engine. The depicted sys- 
tem is illustrative and other systems of the invention can 
be realized as shared memory subsystems that can be 
embedded into an application program, or be imple- 
mented as an embedded code of a hardware device. 
Other such applications can be practiced without de- 
parting from the scope of the invention. 

The depicted virtual shared memory 22 illustrates a 
virtual shared memory that is accessible by each of the 
nodes 12a-12c via the shared memory subsystem 20. 
The virtual shared memory 22 can map to devices that 
provide physical storage for computer readable data, 
depicted in FIG. 1 as a plurality of pages 24a-24d. In
one embodiment, the pages form portions of the shared
memory space and divide the address space of the
shared memory into page addressable memory spaces.
For example, the address space can be paged into 4K
byte sections. In other embodiments, alternative granularity
can be employed to manage the shared memory
space. Each node 12a-12c through the shared memory
subsystem 20 can access each page 24a-24d stored in
the virtual shared memory 22. Each page 24a-24d represents
a unique entry of computer data stored within
the virtual shared memory 22. Each page 24a-24d is accessible
to each one of the nodes 12a-12c, and alternatively,
each node can store additional pages of data within
the virtual shared memory 22. Each newly stored
page of data can be accessible to each of the other
nodes 12a-12c. Accordingly, the virtual shared memory
22 provides a system for sharing and communicating
data between each node 12 of the computer network 10.
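
As a worked illustration of paging the address space into 4K byte sections, the arithmetic below splits an address into a page number and a byte offset. The function names are hypothetical; only the 4K page size comes from the text.

```python
from typing import Tuple

PAGE_SIZE = 4 * 1024  # 4K byte pages, the example granularity given in the text


def split_address(address: int) -> Tuple[int, int]:
    """Return (page number, byte offset within that page)."""
    return divmod(address, PAGE_SIZE)


def page_base(address: int) -> int:
    """Address of the first byte of the page that contains `address`."""
    return address - (address % PAGE_SIZE)


page, offset = split_address(0x12345)
```

With 4K pages the low 12 bits of an address are the offset, so address 0x12345 falls in page 0x12 at offset 0x345. A different granularity only changes `PAGE_SIZE`.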

FIG. 2 illustrates in functional block diagram form a
computer network 30 that has a distributed shared
memory. In this embodiment, each node 12a-12c has a
memory subsystem 32 that connects between the operating
system 16 and the two local memory devices,
the RAM 34 and the disk 36, and that further couples to
a network 38 that couples to each of the depicted nodes
12a, 12b and 12c and to a network memory device 26.

More particularly, FIG. 2 illustrates a distributed
shared memory network 30 that includes a plurality of
nodes 12a, 12b and 12c, each including a processing
unit 14, an operating system 16, a memory subsystem
32, a RAM 34, and a disk 36. FIG. 2 further depicts a
computer network system 38 that connects between the
nodes 12a, 12b and 12c and the network memory device
26. The network 38 provides a network communication
system across these elements.

The illustrated memory subsystems 32a-32c that
connect between the operating system 16a-16c, the
memory elements 34a-34c, 36a-36c, and the network
38, encapsulate the local memories of each of the nodes
to provide an abstraction of a shared virtual memory
system that spans across each of the nodes 12a, 12b
and 12c on the network 38. The memory subsystems
32a-32c can be software modules that act as distributors
to map portions of the addressable memory space
across the depicted memory devices. The memory subsystems
further track the data stored in the local memory
of each node 12 and further operate network connections
with network 38 for transferring data between
the nodes 12a-12c. In this way, the memory subsystems
32a-32c access and control each memory element on
the network 38 to perform memory access operations
that are transparent to the operating system 16. Accordingly,
the operating system 16 interfaces with the memory
subsystem 32 as an interface to a global memory
space that spans each node 12a-12c on the network 38.

FIG. 2 further depicts that the system 30 provides
a distributed shared memory that includes persistent
storage for portions of the distributed memory. In particular,
the depicted embodiment includes a memory subsystem,
such as subsystem 32a, that interfaces to a persistent
memory device, depicted as the disk 36a. The
subsystem 32a can operate the persistent memory device
to provide persistent storage for portions of the distributed
shared memory space. As illustrated, each persistent
memory device 36 depicted in FIG. 2 has a portion
of the addressable memory space mapped onto it.
For example, device 36a has the portions of the addressable
memory space, C0, Cd, Cg, mapped onto it,
and provides persistent storage for data signals stored
in those ranges of addresses.

Accordingly, the subsystem 32a can provide inte- 
grated control of persistent storage devices and elec- 
tronic memory to allow the distributed shared memory 
space to span across both types of storage devices, and 
to allow portions of the distributed shared memory to
move between persistent and electronic memory depending
on predetermined conditions, such as recent
usage.

In one optional embodiment, the nodes of the net- 
work are organized into a hierarchy of groups. In this 
embodiment, the memory subsystems 32a-32c can in- 
clude a hierarchy manager that provides hierarchical 
control for the distribution of data. This includes control- 
ling the migration controller, and policy controller, which 
are discussed in detail below, to perform hierarchical data
migration and load balancing, such that data migrates
primarily between computers of the same group, and
passes to other groups in hierarchical order. Resource 
distribution is similarly managed. 

FIG. 3 illustrates in more detail one shared memory 
subsystem 40 according to the invention. FIG. 3 depicts 
a shared memory subsystem 40 that includes an interface
42, a DSM directory manager 44, a memory controller
46, a local disk cache controller 48, and a local RAM
cache controller 50. FIG. 3 further depicts the network 
54, an optional consumer of the DSM system, depicted 
as the service 58, the operating system 16, a disk driver 
60, a disk element 62 and a RAM element 64. 

The shared memory subsystem 40 depicted in FIG. 
3 can encapsulate the memory management operations 
of the network node 12 to provide a virtual shared memory
that can span across each node that connects into
the network 54. Accordingly, each local node 12 views
the network as a set of nodes that are each connected 
to a large shared computer memory. 

The depicted interface 42 provides an entry point 
for the local node to access the shared memory space
of the computer network. The interface 42 can couple
directly to the operating system 16, to a distributed service
utility such as the depicted DSM file system 58, to a
distributed user-level service utility, or alternatively to 
any combination thereof. 

The depicted interface 42 provides an API that is a 
memory oriented API. Thus, the illustrated interface 42 
can export a set of interfaces that provide low-level con- 
trol of the distributed memory. As illustrated in FIG. 3, 
the interface 42 exports the API to the operating system 
16 or to the optional DSM service 58. The operating system
16 or the service employs the interface 42 to request
standard memory management techniques, such as
reading and writing from portions of the memory space.
These portions of the memory space can be the pages
as described above, which can be 4K byte portions of
the shared memory space, or other units of memory,
such as objects or segments. Each page can be located
within the shared memory space which is designated by
a global address signal for that page of memory. The
system can receive address signals from an application
program or, optionally, can include a global address
generator that generates the address signals. The address
generator can include a spanning module that
generates address signals for a memory space that
spans the storage capacity of the network.

Accordingly, in one embodiment, the interface 42
receives requests to manipulate pages of the shared
memory space. To this end, the interface 42 can comprise
a software module that includes a library of functions
that can be called by services, the OS 16, or another
caller or device. The function calls provide the OS 16
with an API of high level memory oriented services, such
as read data, write data, and allocate memory. The implementation
of the functions can include a set of calls
to controls that operate the directory manager 44 and
the local memory controller 46. Accordingly, the interface
42 can be a set of high level memory function calls
to interface to the low-level functional elements of
shared memory subsystem 40.
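
The layering described above, with high level calls implemented on top of a directory manager and a local memory controller, can be sketched as follows. All class and method names here are hypothetical stand-ins, and both lower elements are reduced to in-memory tables on a single node; the sketch shows the call structure only.

```python
from typing import Dict, Set

PAGE_SIZE = 4096


class DirectoryManager:
    """Hypothetical stand-in: records which page addresses are allocated."""

    def __init__(self) -> None:
        self.allocated: Set[int] = set()
        self._next = 0

    def allocate(self) -> int:
        addr = self._next
        self._next += PAGE_SIZE
        self.allocated.add(addr)
        return addr


class LocalMemoryController:
    """Hypothetical stand-in: holds page data in a dictionary."""

    def __init__(self) -> None:
        self.pages: Dict[int, bytes] = {}


class Interface:
    """Library of memory oriented calls layered on the two elements."""

    def __init__(self, directory: DirectoryManager,
                 memory: LocalMemoryController) -> None:
        self.directory, self.memory = directory, memory

    def allocate(self) -> int:
        addr = self.directory.allocate()
        self.memory.pages[addr] = bytes(PAGE_SIZE)  # zero-filled page
        return addr

    def write(self, addr: int, data: bytes) -> None:
        self._check(addr)
        # Pad or truncate so that every stored page is exactly PAGE_SIZE long.
        self.memory.pages[addr] = data.ljust(PAGE_SIZE, b"\0")[:PAGE_SIZE]

    def read(self, addr: int) -> bytes:
        self._check(addr)
        return self.memory.pages[addr]

    def _check(self, addr: int) -> None:
        if addr not in self.directory.allocated:
            raise KeyError(f"page {addr:#x} is not allocated")


dsm = Interface(DirectoryManager(), LocalMemoryController())
addr = dsm.allocate()
dsm.write(addr, b"hello")
```

A caller such as an operating system or file service would see only `allocate`, `read`, and `write`, while the lower elements remain free to change.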

FIG. 3 further depicts a DSM directory manager 44
that couples to the interface 42. The interface 42 passes
request signals that represent requests to implement
memory operations such as allocating a portion of memory,
locking a portion of memory, mapping a portion of
memory, or some other such memory function. The directory
manager 44 manages a directory that can include
mappings that can span across each memory device
connected to the network 38 depicted in FIG. 2, including
each RAM and disk element accessible by the
network. The directory manager 44 stores a global directory
structure that provides a map of the global address
space. In one embodiment, as will be explained in
greater detail hereinafter, the directory manager 44 provides
a global directory that maps between global address
signals and responsible nodes on the network. A
responsible node stores information regarding the location
and attributes of data associated with a respective
global address, and optionally stores a copy of that
page's data. Consequently, the directory manager 44
tracks information for accessing any address location
within the virtual address space.
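
The two-step lookup described above, from global address to responsible node and from there to location and attributes, can be sketched with two tables. The table layout, node names, location strings, and the `locate` function are assumptions made for illustration only.

```python
from typing import Dict, Tuple

PAGE_SIZE = 4096

# Global directory: page base address -> name of the responsible node.
global_directory: Dict[int, str] = {0x0000: "node_a", 0x1000: "node_b"}

# Kept by each responsible node: page base -> (current location, attributes).
node_tables: Dict[str, Dict[int, Tuple[str, dict]]] = {
    "node_a": {0x0000: ("node_c:ram", {"writable": True})},
    "node_b": {0x1000: ("node_b:disk", {"writable": False})},
}


def locate(global_address: int) -> Tuple[str, str, dict]:
    """Resolve an address to (responsible node, location, attributes)."""
    page = global_address - (global_address % PAGE_SIZE)
    responsible = global_directory[page]
    location, attributes = node_tables[responsible][page]
    return responsible, location, attributes


found = locate(0x1234)
```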

The control of the distributed shared memory can
be coordinated by the directory manager 44 and the
memory controller 46. The directory manager 44 maintains
a directory structure that can operate on a global
address received from the interface 42 and identify, for
that address, a node on the network that is responsible
for maintaining the page associated with that address
of the shared memory space. Once the directory manager
44 identifies which node is responsible for maintaining
a particular address, the directory manager 44
can identify a node that stores information for locating
a copy of the page, and make the call to the memory

network that carries data signals representative of computer
readable information, a persistent memory device
that couples to the data network and that provides per- 
sistent data storage, and plural computers that each 
have an interface that couples to the data network, for 
accessing the data network to exchange data signals 
therewith. Moreover, each of the computers can include 
a shared memory subsystem for mapping a portion of 
the addressable memory space to a portion of the per- 
sistent storage to provide addressable persistent stor- 
age for data signals. 

In a system that distributes the storage across the 
memory devices of the network, the persistent memory 
device will be understood to include a plurality of local 
persistent memory devices that each couple to a re- 
spective one of the plural computers. To this same end, 
the system can also include a distributor for mapping 
portions of the addressable memory space across the 
plurality of local persistent memory devices and a disk 
directory manager, for tracking the mapped portions of 
said addressable memory space to provide information 
representative of the local persistent memory device
that stores that portion of said addressable memory 
space mapped thereon. 

The systems can also include a cache system for 
operating one of the local persistent memory devices as 
a cache memory for cache storing data signals associ- 
ated with recently accessed portions of the addressable 
memory space. Further the system can include a migra- 
tion controller for selectively moving portions of the ad- 
dressable memory space between the local persistent 
memory devices of the plural computers. The migration 
controller can determine and respond to data access 
patterns, resource demands or any other criteria or heu- 
ristic suitable for practice with the invention. According- 
ly, the migration controller can balance the loads on the 
network, and move data to nodes from which it is com- 
monly accessed. The cache controller can be a software 
program running on a host computer to provide a soft- 
ware managed RAM and disk cache. The RAM can be 
any volatile memory including SRAM, DRAM or any oth- 
er volatile memory. The disk can be any persistent mem- 
ory including any disk, RAID, tape or other device that 
provides persistent data storage. 
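
One heuristic a migration controller of this kind could apply is to move each page to the node that accesses it most often. The sketch below assumes that single policy; the class name, its methods, and the access-count bookkeeping are hypothetical and stand in for whatever criteria an actual embodiment would use.

```python
from collections import Counter
from typing import Dict


class MigrationController:
    """Toy controller: tracks accesses and relocates pages accordingly."""

    def __init__(self, placement: Dict[int, str]) -> None:
        self.placement = placement               # page -> node storing it
        self.accesses: Dict[int, Counter] = {}   # page -> per-node access counts

    def record_access(self, page: int, node: str) -> None:
        self.accesses.setdefault(page, Counter())[node] += 1

    def rebalance(self) -> None:
        # Assumed policy: place each page on the node that used it most.
        for page, counts in self.accesses.items():
            busiest, _ = counts.most_common(1)[0]
            self.placement[page] = busiest


controller = MigrationController({0x5000: "node_a"})
for node in ("node_b", "node_b", "node_a"):
    controller.record_access(0x5000, node)
controller.rebalance()
```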

The systems can also include a coherent replication 
controller for generating a copy, or select number of cop- 
ies, of a portion of the addressable memory space main- 
tained in the local persistent memory device of a first 
computer and for storing the copy in the local persistent 
memory device of a second computer. The coherent 
replication controller can maintain the coherency of the
copies to provide coherent data replication. 

The systems can also be understood to provide integrated
control of data stored in volatile memory and
in persistent memory. In such systems a volatile memory
device has volatile storage for data signals, and the
shared memory subsystem includes an element, typically
a software module, for mapping a portion of the
addressable memory space to a portion of the volatile
storage. In these systems the volatile memory device
can be comprised of a plurality of local volatile memory
devices each coupled to a respective one of the plural
computers, and the persistent memory device can be
comprised of a plurality of local persistent memory devices
each coupled to a respective one of the plural computers.

In these systems, a directory manager can track the
mapped portions of the addressable memory space,
and can include two sub-components: a disk directory
manager for tracking portions of the addressable memory
space mapped to the local persistent memory devices,
and a RAM directory manager for tracking portions
of the addressable memory space mapped to the
local volatile memory devices. Optionally, a RAM cache
system can operate one of the local volatile memory devices
as a cache memory for cache storing data signals
associated with recently accessed portions of the
addressable memory space.

The systems can include additional elements including
a paging element for remapping a portion of the
addressable memory space between one of the local
volatile memory devices and one of the local persistent
memory devices; a policy controller for determining a
resource available signal representative of storage
available on each of the plural computers, and a paging
element that remaps the portion of addressable memory
space from a memory device of a first computer to a
memory device of a second computer, responsive to the
resource available signal; and a migration controller for
moving portions of addressable memory space between
the local volatile memory devices of the plural computers.

Optionally, the systems can include a hierarchy
manager for organizing the plural computers into a set
of hierarchical groups wherein each group includes at
least one of the plural computers. Each group can
include a group memory manager for migrating portions
of addressable memory space as a function of the
hierarchical groups.

The system can maintain coherency between copied
portions of the memory space by including a coherent
replication controller for generating a coherent copy
of a portion of addressable memory space.

The system can generate or receive global address
signals. Accordingly, the systems can include an address
generator for generating a global address signal
representative of a portion of addressable memory
space. The address generator can include a spanning
unit for generating global address signals as a function
of a storage capacity associated with the persistent
memory devices, to provide global address signals capable
of logically addressing the storage capacity of the
persistent memory devices.
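
A minimal sketch of an address generator that depends on storage capacity is given below. It assumes page-aligned addresses handed out in increasing order and a 128-bit ceiling on address width; the names `AddressGenerator`, `address_bits_needed`, and `next_page`, and the three-disk capacity figure, are hypothetical.

```python
GLOBAL_ADDRESS_BITS = 128  # assumed upper bound on the width of a global address
PAGE_SIZE = 4096


def address_bits_needed(total_capacity_bytes: int) -> int:
    """Smallest address width that can name every byte of the capacity."""
    return max(1, (total_capacity_bytes - 1).bit_length())


class AddressGenerator:
    """Hands out page-aligned global addresses in increasing order."""

    def __init__(self, total_capacity_bytes: int) -> None:
        if address_bits_needed(total_capacity_bytes) > GLOBAL_ADDRESS_BITS:
            raise ValueError("capacity exceeds the global address width")
        self.capacity = total_capacity_bytes
        self._next = 0

    def next_page(self) -> int:
        if self._next + PAGE_SIZE > self.capacity:
            raise MemoryError("no addressable storage left")
        addr = self._next
        self._next += PAGE_SIZE
        return addr


capacity = 3 * (2 ** 40)  # e.g. three hypothetical 1 TiB persistent devices
generator = AddressGenerator(capacity)
first, second = generator.next_page(), generator.next_page()
```

Python integers are unbounded, so wide addresses need no special type here; a lower-level implementation would need an explicit multi-word representation.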

In distributed systems, the directory manager can
be a distributed directory manager for storing within the
distributed memory space, a directory signal representative
of a storage location of a portion of the addressable
memory space. The distributed directory manager can
include a directory page generator for allocating a portion
of the addressable memory space and for storing
therein an entry signal representative of a portion of the
directory signal. The directory page generator optionally 
includes a range generator for generating a range signal 
representative of a portion of the addressable memory 
space, and for generating the entry signal responsive to 
the range signal, to provide an entry signal representa- 
tive of a portion of the directory signal that corresponds 
to the portion of the addressable memory space. More- 
over, the distributed directory manager can include a 
linking system for linking the directory pages to form a 
hierarchical data structure of the linked directory pages 
as well as a range linking system for linking the directory 
pages, as a function of the range signal, to form a hier- 
archical data structure of linked directory pages. 
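
The linking of directory pages by address range can be pictured as a small tree in which each page covers a range and either links to child pages or holds entries. The sketch below assumes that shape; `DirectoryPage`, its fields, and `lookup` are hypothetical names, and the entries simply map an address to a node name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DirectoryPage:
    start: int                       # covers the address range [start, end)
    end: int
    children: List["DirectoryPage"] = field(default_factory=list)
    entries: Dict[int, str] = field(default_factory=dict)  # leaf: address -> node


def lookup(page: DirectoryPage, address: int) -> Optional[str]:
    """Walk from the root down to the page whose range holds the address."""
    if not (page.start <= address < page.end):
        return None
    for child in page.children:
        if child.start <= address < child.end:
            return lookup(child, address)
    return page.entries.get(address)


low = DirectoryPage(0x0000, 0x8000, entries={0x1000: "node_a"})
high = DirectoryPage(0x8000, 0x10000, entries={0x9000: "node_b"})
root = DirectoryPage(0x0000, 0x10000, children=[low, high])
```

Because each directory page covers a bounded range, a lookup touches one page per level rather than scanning the whole directory.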

As the data stored by the system can be homeless, 
in that the data has no fixed physical home, but can migrate,
as resources and other factors dictate, between
the memory devices of the network, a computer system
according to the invention can include a directory page
generator that has a node selector for generating a
responsible node signal representative of a select one of
the plural computers having location information for a
portion of the shared address space. This provides a
level of indirection that decouples the directory from the
physical storage location of the data. Accordingly, the 
directory needs only to identify the node, or other device, 
that tracks the physical location of the data. This way, 
each time data migrates between physical storage lo- 
cations, the directory does not have to be updated, since 
the node tracking the location of the data has not 
changed and still provides the physical location informa- 
tion. 
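
The effect of this indirection can be shown with two tables: a directory that names only the responsible node, and a location table kept by that node. The names and the string form of a location are assumptions for illustration. When a page migrates, only the responsible node's table changes.

```python
from typing import Dict

# Directory: page -> responsible node (never stores a physical location).
directory: Dict[int, str] = {0x4000: "node_a"}

# Kept by each responsible node: page -> current physical location.
location_tables: Dict[str, Dict[int, str]] = {"node_a": {0x4000: "node_b:ram"}}


def migrate(page: int, new_location: str) -> None:
    """Record a move; only the responsible node's table is touched."""
    responsible = directory[page]
    location_tables[responsible][page] = new_location


directory_before = dict(directory)
migrate(0x4000, "node_c:disk")
```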

Accordingly, the system can include page genera- 
tors that generate directory pages that carry information 
representative of a location monitor, such as a responsible
computer node, that tracks a data storage location,
to provide a directory structure for tracking homeless data.
Moreover, the directory itself can be stored as pages
within the virtual memory space. Therefore, the data
storage location can store information representative of
a directory page, to store the directory structure as pages
of homeless data. 

In another aspect, the invention can be understood 
as methods for providing a computer system having a 
shared addressable memory space. The method can in- 
clude the steps of providing a network for carrying data 
signals representative of computer readable informa- 
tion, providing a hard-disk, coupled to the network, and 
having persistent storage for data signals, providing plu- 
ral computers, each having an interface, coupled to the 
data network, for exchanging data signals between the
plural computers, and assigning a portion of the addressable
memory space to a portion of the persistent
storage of the hard disk to provide addressable persistent
storage for data signals.

Thus, at least in its preferred embodiments, the
invention provides an improved computer system.

Furthermore, at least in its preferred embodiments, 
the invention provides computer network systems that 
have adaptable system configurations for dynamically 
exploiting distributed resources and thereby increasing 
network productivity. 

Moreover, at least in its preferred embodiments, the 
invention provides computer network systems that elim- 
inate the need for application programs to provide low- 
level memory management services across two or more 
distributed nodes. 

Yet further, the invention, at least in its preferred em- 
bodiments, provides computer network systems that 
have improved fault tolerance and that are more readily 
scaleable for adding additional workstations as well as 
for the interconnection of two or more networks. 

Some embodiments of the invention will now be de- 
scribed by way of example only and with reference to 
the accompanying drawings, in which:

FIG. 1 illustrates a distributed shared memory computer
network according to the invention;

FIG. 2 is a functional block diagram that illustrates
in more detail one distributed shared memory computer
network of the type shown in FIG. 1;

FIG. 3 illustrates in more detail a shared memory
subsystem suitable for practice with the network
illustrated in FIG. 2;

FIG. 4 is a functional block diagram of one shared
memory subsystem according to the invention;

FIG. 5 illustrates a directory page that can be provided
by a shared memory subsystem of the type
depicted in FIG. 4;

FIG. 6 illustrates a directory that can be distributed
within a shared memory and formed of directory
pages of the type illustrated in FIG. 5; and

FIG. 7 illustrates in functional block diagram form a
system of the invention that employs a directory according
to FIG. 6 for tracking portions of a distributed
shared memory.

FIG. 1 illustrates a computer network 10 that pro- 
vides a shared memory that spans the memory space 
of each node of the depicted computer network 10.

Specifically, FIG. 1 illustrates a computer network
10 that includes a plurality of nodes 12a-12c, each having
a CPU 14, an operating system 16, an optional private
memory device 18, and a shared memory subsystem
20. As further depicted by FIG. 1, each node 12a-12c
connects via the shared memory subsystem 20 to
a virtual shared memory 22. As will be explained in
greater detail hereinafter, by providing the shared memory
subsystem 20 that allows the node 12a-12c to access
the virtual shared memory 22, the computer network
10 enables network nodes 12a-12c to communicate
and share functionality using the same techniques

(57) Distributed shared memory systems and processes
that can connect into each node of a computer
network to encapsulate the memory management operations
of the connected nodes and to provide thereby
an abstraction of a shared virtual memory that can span
across each node of the network and that optionally
spans across each memory device connected to the 
computer network. Accordingly, each node on the net- 
work having the distributed shared memory system of 
the invention can access the shared memory. 

Description

The invention relates to computer systems, and 
more particularly, to computer networking systems and 
methods that provide shared memory systems and 
services. 

The conventional computer network includes a 
number of client computers connected together and fur- 
ther connected to a server computer that stores the data 
and the programs that client computers employ during 
network operation. This configuration is generally re- 
ferred to as a client-server network. Typically, each cli- 
ent is a conventional computer system that includes a 
private main memory, typically a RAM memory, and a 
persistent storage, typically a hard disk. The server is 
usually an expensive high end machine that includes a 
high speed processor unit and a large memory, often 
having ten to one hundred times more storage than the 
individual client computers. The clients and server cooperate
to share data and services among the different
users, and to thereby make the individual computers ap- 
pear as a unified distributed system. 

To this end, the server acts as a central controller 
that provides through its large memory a central repository
of network data, and that distributes services to the
individual client computers, generally on an as-available
basis. Typically, these services are provided by means
of specialized software running on a high speed processor.

Although computer networks based on this client- 
server model have generally been successful at provid- 
ing users with necessary computer services, as the user 
demands on computer systems have increased, the 
weaknesses in the client-server network are beginning 
to place limits on the services that can be provided. 

An additional problem with the client-server network 
is that it provides a static operating environment that is 
set for optimal performance at a certain level of network 
activity. Consequently, the client-server network fails to 
exploit available resources to improve system perform- 
ance. In particular, as the system activity rises above or 
drops below the expected level of network activity, the 
static operating environment lacks any ability to recon- 
figure dynamically the allocation of network resources 
to one providing better performance for the present level 
of activity. 

Moreover, the client-server computer network requires
that computer programs written to operate on the
client-server network distribute themselves between clients
and the server. This requires that the application
programs implement a set of functions that divide the
program between the clients and the server. This distribution
of the application programs requires that the
client-server application programs be quite complex. For
example, a client-server computer program that shares
data between different machines must include functionality
that allows for the distribution of multiple copies of
data files, the maintenance of coherency for the distributed
copies, and other such low-level management
services.

Further troubling is that the client-server network 
stores all important applications and data files in the 
memory of the server system. Consequently, the client-
server network is subject to complete system failure 
each time the server system crashes. 

For the above reasons, among others, the present 
client-server computer architecture fails to provide an
adequate response to the increased demands of today's 
computer users. 

The invention provides systems that can create and 
manage a virtual memory space that can be shared by 
each computer on a network and can span the storage 
space of each memory device connected to the network. 
Accordingly, all data stored on the network can be
stored within the virtual memory space and the actual 
physical location of the data can be in any of the memory 
devices connected to the network. 

More specifically, the system can create or receive
a global address signal that represents a portion, for ex- 
ample 4k bytes, of the virtual memory space. The global 
address signal can be decoupled from, i.e. unrelated to, 
the physical and virtual address spaces of the underly- 
ing computer hardware, to provide support for a memory 
space large enough to span each volatile and persistent 
memory device connected to the system. For example, 
systems of the invention can operate on 32-bit comput- 
ers, but can employ global address signals that can be 
128 bits wide. Accordingly, the virtual memory space 
spans 2^128 bytes, which is much larger than the 2^32
address space supported by the underlying computer
hardware. Such an address space can be large enough 
to provide a separate address for every byte of data stor- 
age on the network, including all RAM, disk and tape 
storage. 
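
As a rough illustration of the arithmetic above, the following Python sketch splits a 128-bit global address into a page number and a byte offset, assuming that each portion of the virtual memory space is a 4k-byte (4096-byte) page. The names ADDRESS_BITS, PAGE_SIZE, split_global_address and join_global_address are hypothetical and do not come from the specification.

```python
# Illustrative sketch only: all names below are assumptions made for this
# example, not identifiers used by the described system.
from typing import Tuple

ADDRESS_BITS = 128                         # width of a global address signal
PAGE_SIZE = 4096                           # assumed 4k-byte portion (page)
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 bits select a byte in a page


def split_global_address(addr: int) -> Tuple[int, int]:
    """Return (page_number, offset) for a 128-bit global address."""
    if not 0 <= addr < (1 << ADDRESS_BITS):
        raise ValueError("address does not fit in 128 bits")
    return addr >> OFFSET_BITS, addr & (PAGE_SIZE - 1)


def join_global_address(page_number: int, offset: int) -> int:
    """Inverse of split_global_address."""
    if not 0 <= offset < PAGE_SIZE:
        raise ValueError("offset lies outside the page")
    return (page_number << OFFSET_BITS) | offset


# The 128-bit space holds 2^96 times as many byte addresses as a
# 32-bit hardware address space.
RATIO_TO_32_BIT = (1 << ADDRESS_BITS) // (1 << 32)
```

Python integers are arbitrary precision, so a 128-bit address can be handled on a machine with a narrower native word, which mirrors the decoupling of the global address from the hardware address width.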

For such a large virtual memory space, typically on- 
ly a small portion is storing data at any time. Accordingly, 
the system includes a directory manager that tracks 
those portions of the virtual memory space that are in 
use. The system provides physical memory storage for 
each portion of the virtual memory space in use by map- 
ping each such portion to a physical memory device, 
such as a RAM memory or a hard-drive. Optionally, the 
mapping includes a level of indirection that facilitates da- 
ta migration, fault-tolerant operation, and load balanc- 
ing. 
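
The mapping with a level of indirection can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the directory manager of the specification: the class names Directory and PhysicalLocation, the method names, and the device labels are all hypothetical. Only pages in use have entries, and each page number resolves first to a handle and then to a physical location.

```python
# Illustrative sketch only: class, method and device names are assumptions
# made for this example, not part of the described system.
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PhysicalLocation:
    device: str   # hypothetical label, e.g. "ram:node-a" or "disk:node-b"
    offset: int   # byte offset of the page on that device


class Directory:
    """Tracks only those pages of the virtual memory space that are in use."""

    def __init__(self) -> None:
        # First level: global page number -> handle.
        self._handles: Dict[int, int] = {}
        # Second level (the indirection): handle -> current physical location.
        self._locations: Dict[int, PhysicalLocation] = {}
        self._next_handle = 0

    def map_page(self, page_number: int, location: PhysicalLocation) -> None:
        """Bring a page into use by giving it physical storage."""
        handle = self._handles.get(page_number)
        if handle is None:
            handle = self._next_handle
            self._next_handle += 1
            self._handles[page_number] = handle
        self._locations[handle] = location

    def lookup(self, page_number: int) -> Optional[PhysicalLocation]:
        """Return the page's current location, or None if it is not in use."""
        handle = self._handles.get(page_number)
        return None if handle is None else self._locations[handle]

    def migrate(self, page_number: int, new_location: PhysicalLocation) -> None:
        """Move a page; only the second-level entry changes."""
        self._locations[self._handles[page_number]] = new_location
```

Because a page keeps its global page number and handle when it moves, data can migrate between devices, for example from RAM to disk, without changing the address that applications use.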

By allowing each computer to monitor and track 
which portions of the virtual memory space are in use, 
each computer can share the memory space. This al- 
lows the networked computers to appear to have a sin- 
gle memory, and therefore can allow application pro- 
grams running on different computers to communicate 
using techniques currently employed to communicate
between applications running on the same machine.

In one aspect, the invention can be understood to
include computer systems having a shared addressable
memory space. The systems can comprise a data net- 
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